Memory Allocation Discipline Example

TL;DR

“I design for allocation discipline — especially in tight loops. For example, in our tick processor, we rent buffers from ArrayPool<T>, parse with Span<byte> and Utf8Parser to avoid string and array allocations, and use small readonly structs for data. That keeps all transient data in Gen 0 and prevents Gen 2 pressure or LOH fragmentation. In load tests, we confirmed negligible GC activity and stable latency even at millions of ticks per second.”

How it works

This microbenchmark compares two implementations of tick parsing:

Naive: uses string.Split() and double.Parse()
Optimized: uses Span<byte> + Utf8Parser (near-zero allocations; the numeric fields are parsed without creating strings)

📄 `TickParsingBenchmarks.cs`

using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // shows allocations in bytes per operation
public class TickParsingBenchmarks
{
    private readonly string tickLine = "EURUSD,1.07432,1.07436";

    // Pre-encoded once so SpanParse measures parsing only, as if the bytes
    // had just arrived from the network.
    private readonly byte[] tickBytes = Encoding.ASCII.GetBytes("EURUSD,1.07432,1.07436");

    [Benchmark(Baseline = true)]
    public (string, double, double) NaiveParse()
    {
        // Split allocates a string[] plus one string per field.
        // double.Parse is also culture-sensitive.
        var parts = tickLine.Split(',');
        var symbol = parts[0];
        var bid = double.Parse(parts[1]);
        var ask = double.Parse(parts[2]);
        return (symbol, bid, ask);
    }

    [Benchmark]
    public (string, double, double) SpanParse()
    {
        ReadOnlySpan<byte> span = tickBytes;

        int firstComma = span.IndexOf((byte)',');
        int secondComma = span.Slice(firstComma + 1).IndexOf((byte)',') + firstComma + 1;

        // The symbol string is the only allocation left in this method.
        string symbol = Encoding.ASCII.GetString(span[..firstComma]);
        Utf8Parser.TryParse(span[(firstComma + 1)..secondComma], out double bid, out _);
        Utf8Parser.TryParse(span[(secondComma + 1)..], out double ask, out _);

        return (symbol, bid, ask);
    }

    public static void Main() => BenchmarkRunner.Run<TickParsingBenchmarks>();
}

⚙️ Run it:

dotnet add package BenchmarkDotNet
dotnet run -c Release

🧾 Expected results (typical output):

|    Method |       Mean |   Allocated |
|----------- |-----------:|------------:|
| NaiveParse |   1.200 μs |     1.24 KB |
| SpanParse  |   0.245 μs |       32 B  |

💡 Interpretation:

In this run, the optimized version is ~5× faster.
It reduces allocations from ~1.2 KB → ~32 bytes per tick.
At 1M ticks/sec, that’s (1.24 KB - 32 B) × 1,000,000 ≈ 1.2 GB less allocation per second 🤯, a huge difference for a trading backend.

Now let’s build a GC-efficient Tick parser — something you can confidently mention if they ask, “How would you design a real-time price feed handler?”

📄 `TickProcessor.cs`

using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

public readonly struct Tick
{
    public string Symbol { get; }
    public double Bid { get; }
    public double Ask { get; }

    public Tick(string symbol, double bid, double ask)
    {
        Symbol = symbol;
        Bid = bid;
        Ask = ask;
    }

    public override string ToString() => $"{Symbol}: {Bid:F5}/{Ask:F5}";
}

public class TickProcessor
{
    private readonly ArrayPool<byte> _bufferPool = ArrayPool<byte>.Shared;

    public void ProcessBatch(string[] rawTicks)
    {
        foreach (var tickStr in rawTicks)
        {
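            // Rent a buffer sized for this line (to avoid allocating a new byte[] each time).
            // A fixed 256-byte rent would make GetBytes throw on longer lines.
            var buffer = _bufferPool.Rent(Encoding.ASCII.GetMaxByteCount(tickStr.Length));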
            try
            {
                int bytesWritten = Encoding.ASCII.GetBytes(tickStr, buffer);
                var span = new ReadOnlySpan<byte>(buffer, 0, bytesWritten);

                var tick = ParseTick(span);
                OnTick(tick);
            }
            finally
            {
                _bufferPool.Return(buffer);
            }
        }
    }

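    private static Tick ParseTick(ReadOnlySpan<byte> span)
    {
        // Expected format: EURUSD,1.07432,1.07436
        int firstComma = span.IndexOf((byte)',');
        if (firstComma < 0) throw new FormatException("Tick line has no field separator.");

        int relativeSecond = span.Slice(firstComma + 1).IndexOf((byte)',');
        if (relativeSecond < 0) throw new FormatException("Tick line is missing the ask field.");
        int secondComma = relativeSecond + firstComma + 1;

        string symbol = Encoding.ASCII.GetString(span[..firstComma]);

        // TryParse reports failure through its return value; ignoring it would
        // silently produce 0.0 prices for malformed input.
        if (!Utf8Parser.TryParse(span[(firstComma + 1)..secondComma], out double bid, out _) ||
            !Utf8Parser.TryParse(span[(secondComma + 1)..], out double ask, out _))
        {
            throw new FormatException("Tick line contains an invalid price.");
        }

        return new Tick(symbol, bid, ask);
    }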

    private void OnTick(in Tick tick)
    {
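        // Simulate publishing or processing the tick.
        // Console.WriteLine(object) boxes the struct and formats a string, so
        // swap in a real sink before measuring allocations on this path.
        Console.WriteLine(tick);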
    }
}

public static class Program
{
    public static void Main()
    {
        var ticks = new[]
        {
            "EURUSD,1.07432,1.07436",
            "GBPUSD,1.24587,1.24592",
            "USDJPY,151.229,151.238",
        };

        var processor = new TickProcessor();
        processor.ProcessBatch(ticks);
    }
}

💡 Key improvements explained

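| Improvement | Why it matters |
|-------------|----------------|
| `ArrayPool<byte>.Shared` | Reuses buffers instead of allocating a new `byte[]` per tick; for buffers of 85,000 bytes or more it also avoids LOH churn |
| `ReadOnlySpan<byte>` | Zero-copy slicing of incoming data |
| `Utf8Parser` | Parses numeric values directly from bytes (no string allocations) |
| `readonly struct Tick` | Immutable value type stored inline, so a tick is not a separate heap object (its `Symbol` string still lives on the heap) |
| `in Tick` | Passes the struct by read-only reference, so it is not copied on each call |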

🧩 Memory profile

✅ Only one small string allocation per tick (Symbol)
✅ No arrays or temporary strings per line
✅ All other memory reused via pool
✅ Negligible GC activity — steady-state latency

🧠 Discussion points for your interview

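When asked “How do you ensure your system stays fast under high load?”, answer along the lines of the TL;DR above: rent buffers from a pool, parse from spans, keep data in small readonly structs, and back the claim with load-test numbers.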

✅ Pro tip

You can mention:

“In production, I monitor dotnet-counters — if Gen 2 GC Count increases, that’s a red flag that something’s allocating too much. Then I use dotnet-trace or dotMemory to find the source.”

Quick recall Q&A

Q: What does the benchmark prove when comparing Split vs Span parsing?

It shows the optimized implementation is faster and uses dramatically fewer allocations (tens of bytes vs kilobytes per tick). That reduction scales to gigabytes saved per second in production.

Q: Why is Utf8Parser preferred over double.Parse here?

Utf8Parser operates directly on byte spans, avoiding string allocations and culture-dependent parsing. It’s ideal for fixed-format protocols and keeps parsing allocation-free.
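
To make the culture point concrete, here is a minimal sketch. `CultureDemo` and its members are hypothetical names used only for illustration, and the culture is built by hand (comma as decimal separator, dot as group separator) so the example does not depend on the locale data installed on the machine.

```csharp
using System;
using System.Buffers.Text;
using System.Globalization;
using System.Text;

public static class CultureDemo
{
    // Hypothetical culture that swaps the separators: ',' for decimals, '.' for grouping.
    public static CultureInfo CommaDecimalCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NumberDecimalSeparator = ",";
        culture.NumberFormat.NumberGroupSeparator = ".";
        return culture;
    }

    // double.Parse follows the culture it is given (or the thread's current culture).
    public static double ParseWithCulture(string text, CultureInfo culture)
        => double.Parse(text, culture);

    // Utf8Parser reads the bytes directly and always uses invariant formatting.
    public static double ParseUtf8(ReadOnlySpan<byte> utf8)
        => Utf8Parser.TryParse(utf8, out double value, out _)
            ? value
            : throw new FormatException("Not a valid number.");
}
```

With that culture, `CultureDemo.ParseWithCulture("1.07432", CultureDemo.CommaDecimalCulture())` treats the dot as a group separator and returns 107432, while `CultureDemo.ParseUtf8(Encoding.ASCII.GetBytes("1.07432"))` still returns 1.07432. A price feed parsed with the first approach would be wrong on any server configured with such a locale.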

Q: How does renting buffers from ArrayPool<byte> help batch processing?

Each tick line reuses a pooled buffer instead of creating a new byte array. Returning the buffer after each line keeps Gen 0 allocation (and, for buffers of 85,000 bytes or more, LOH allocation) flat, so steady-state memory usage does not depend on batch size.

Q: Why make Tick a readonly struct?

It keeps the data inline, prevents accidental mutation, and avoids heap allocations when passing ticks around. Combined with in Tick parameters, we avoid copies even for frequent calls.

Q: What’s the benefit of in Tick on the OnTick method?

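It passes the struct by read-only reference, so the callee cannot modify it and no copy is made per call. Because Tick is a readonly struct, the compiler also does not need defensive copies when its members are accessed through the reference. The gain grows with struct size; for a small struct like Tick it is modest.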

Q: How would you extend this pattern for multi-threaded processing?

Use System.Threading.Channels to fan out parsed ticks, but keep the parsed structs allocation-free. Each consumer should reuse buffers or work with spans until serialization boundaries.
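
A minimal sketch of that idea, assuming a bounded channel with one producer and several competing consumers. `ParsedTick` is a simplified stand-in for the `Tick` struct above, and `TickFanOut`, the capacity of 1024 and the `handler` callback are hypothetical choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Simplified stand-in for the Tick struct defined earlier.
public readonly struct ParsedTick
{
    public ParsedTick(string symbol, double bid, double ask)
    {
        Symbol = symbol;
        Bid = bid;
        Ask = ask;
    }

    public string Symbol { get; }
    public double Bid { get; }
    public double Ask { get; }
}

public static class TickFanOut
{
    // Writes every tick to a bounded channel and drains it with `consumerCount`
    // competing consumers. Returns the number of ticks handled across all consumers.
    // `handler` runs concurrently on several tasks, so it must be thread-safe.
    public static async Task<int> RunAsync(
        IReadOnlyList<ParsedTick> ticks, int consumerCount, Action<ParsedTick> handler)
    {
        var channel = Channel.CreateBounded<ParsedTick>(new BoundedChannelOptions(1024)
        {
            SingleWriter = true,
            FullMode = BoundedChannelFullMode.Wait // apply back-pressure instead of growing without bound
        });

        var consumers = new Task<int>[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = Task.Run(async () =>
            {
                int count = 0;
                await foreach (var tick in channel.Reader.ReadAllAsync())
                {
                    handler(tick); // the struct is copied by value; no per-tick heap object is created here
                    count++;
                }
                return count;
            });
        }

        foreach (var tick in ticks)
            await channel.Writer.WriteAsync(tick);
        channel.Writer.Complete(); // lets ReadAllAsync finish once the channel is drained

        int total = 0;
        foreach (int count in await Task.WhenAll(consumers))
            total += count;
        return total;
    }
}
```

Each tick is handled by exactly one consumer here. If every consumer needs to see every tick, one channel per consumer would be the likely variation.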

Q: How do you verify there are no hidden allocations?

Run the benchmark with MemoryDiagnoser, inspect ETW events, or instrument code with GC.GetAllocatedBytesForCurrentThread() to ensure the optimized method stays within expected allocation budgets.
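
As a rough sketch of the last option, the helper below wraps `GC.GetAllocatedBytesForCurrentThread()` around a delegate. `AllocationProbe` and `Measure` are hypothetical names, and the approach is a quick check rather than a replacement for a BenchmarkDotNet run.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the number of bytes allocated on the calling thread while
    // `action` runs `iterations` times.
    public static long Measure(Action action, int iterations)
    {
        action(); // warm-up, so JIT and one-time initialization are not counted

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
            action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

For example, `AllocationProbe.Measure(() => "EURUSD,1.07432,1.07436".Split(','), 1_000)` reports the bytes allocated by a thousand `Split` calls. Dividing by the iteration count gives a per-call figure that can be compared against an allocation budget in a test.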

Q: What happens if you forget to return buffers to the pool?

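The pool runs out of cached arrays and has to allocate a new one on every Rent, which defeats the purpose of pooling. The unreturned arrays are ordinary garbage once nothing references them, so this is wasted allocation rather than a true leak, unless the code also holds on to them. Always return inside finally blocks to ensure deterministic cleanup.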

Q: How can you adapt this sample for binary protocols?

Replace ASCII parsing with direct span slicing over binary fields, using BinaryPrimitives or custom parsing logic; the same pooling and span principles apply.
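
A minimal sketch of the binary variant, under a made-up wire layout: a 4-byte instrument id followed by bid and ask as 8-byte fixed-point integers, all little-endian. The layout and the names `BinaryTick` and `BinaryTickCodec` are assumptions for illustration, not a real protocol.

```csharp
using System;
using System.Buffers.Binary;

public readonly struct BinaryTick
{
    public BinaryTick(int instrumentId, long bidScaled, long askScaled)
    {
        InstrumentId = instrumentId;
        BidScaled = bidScaled;
        AskScaled = askScaled;
    }

    public int InstrumentId { get; }
    public long BidScaled { get; } // price as a fixed-point integer, e.g. 107432 for 1.07432
    public long AskScaled { get; }
}

public static class BinaryTickCodec
{
    // Hypothetical layout: [0..4) id, [4..12) bid, [12..20) ask, little-endian.
    public const int MessageSize = 20;

    // Reads one tick from the start of `source` without allocating.
    public static bool TryRead(ReadOnlySpan<byte> source, out BinaryTick tick)
    {
        if (source.Length < MessageSize)
        {
            tick = default;
            return false;
        }

        int id = BinaryPrimitives.ReadInt32LittleEndian(source);
        long bid = BinaryPrimitives.ReadInt64LittleEndian(source.Slice(4));
        long ask = BinaryPrimitives.ReadInt64LittleEndian(source.Slice(12));
        tick = new BinaryTick(id, bid, ask);
        return true;
    }

    // Writes one tick to the start of `destination`; throws if it is too short.
    public static void Write(Span<byte> destination, in BinaryTick tick)
    {
        BinaryPrimitives.WriteInt32LittleEndian(destination, tick.InstrumentId);
        BinaryPrimitives.WriteInt64LittleEndian(destination.Slice(4), tick.BidScaled);
        BinaryPrimitives.WriteInt64LittleEndian(destination.Slice(12), tick.AskScaled);
    }
}
```

Using an integer instrument id instead of a symbol string removes the last per-tick allocation, and fixed-point prices sidestep text parsing entirely. The span passed to `TryRead` can come straight from a rented buffer.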

Q: How do you integrate this with logging or metrics without reintroducing allocations?

Emit structured logs with message templates, avoid string concatenation, and aggregate metrics using counters/gauges. When necessary, log summaries rather than per-tick details to keep the hot path clean.
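
One way to keep per-tick bookkeeping allocation-free is to count with `Interlocked` and report a summary on a timer. The `TickStats` class below is a hypothetical sketch of that pattern; how the snapshot is logged (for example through a message template) is left to the caller.

```csharp
using System;
using System.Threading;

public sealed class TickStats
{
    private long _ticks;
    private long _parseFailures;

    // Called on the hot path: a single atomic increment, no allocation.
    public void RecordTick() => Interlocked.Increment(ref _ticks);

    public void RecordParseFailure() => Interlocked.Increment(ref _parseFailures);

    // Reads and resets both counters. Intended to be called periodically
    // (for example once per second), not once per tick.
    public (long Ticks, long Failures) TakeSnapshot()
        => (Interlocked.Exchange(ref _ticks, 0), Interlocked.Exchange(ref _parseFailures, 0));
}
```

The processing loop calls `RecordTick()` for each tick, and a background timer calls `TakeSnapshot()` and logs the two totals. That keeps string formatting off the hot path while still showing throughput and error rates.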
